An MDL Estimate of the Significance of Rules

Authors

  • John G. Cleary
  • Shane Legg
  • Ian H. Witten
Abstract

This paper proposes a new method for measuring the performance of models (whether decision trees or sets of rules) inferred by machine learning methods. Inspired by the minimum description length (MDL) philosophy and theoretically rooted in information theory, the new method measures the complexity of test data with respect to the model. It has been evaluated on rule sets produced by several different machine learning schemes on a large number of standard data sets. When compared with the usual percentage correct measure, it is shown to agree with it in restricted cases. However, in other, more general cases taken from real data sets (for example, when rule sets make multiple or no predictions), it disagrees substantially. It is argued that the MDL measure is more reasonable in these cases and represents a better way of assessing the significance of a rule set's performance. The question of the complexity of the rule set itself is not addressed in the paper.
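
The abstract does not give the paper's actual coding scheme, so the following is only an illustrative sketch of the general idea of measuring test data in bits relative to a model's predictions. It assumes a simple hypothetical code: each rule set predicts a set of classes per instance (possibly empty, possibly several), a fixed probability mass `1 - eps` is shared among the predicted classes, and `eps` is shared among the rest. The function names, the `eps` parameter, and the code itself are assumptions made for illustration, not the authors' method.

```python
import math


def description_length_bits(predictions, truths, num_classes, eps=0.1):
    """Bits needed to encode the true labels given per-instance predicted class sets.

    Hypothetical coding scheme (an assumption, not taken from the paper):
    - no prediction, or every class predicted: uniform code over all classes;
    - otherwise: mass (1 - eps) split over predicted classes,
      mass eps split over the remaining classes.
    """
    total = 0.0
    for predicted, truth in zip(predictions, truths):
        predicted = set(predicted)
        n_pred = len(predicted)
        if n_pred == 0 or n_pred == num_classes:
            p = 1.0 / num_classes  # the model conveys no information here
        elif truth in predicted:
            p = (1.0 - eps) / n_pred
        else:
            p = eps / (num_classes - n_pred)
        total += -math.log2(p)
    return total


def baseline_bits(truths, num_classes):
    """Bits needed with no model at all: a uniform code over the classes."""
    return len(truths) * math.log2(num_classes)


if __name__ == "__main__":
    # Four test instances, four classes; predictions are sets of class ids.
    preds = [{0}, {0, 1}, set(), {2}]
    truth = [0, 1, 1, 0]
    print(description_length_bits(preds, truth, num_classes=4))
    print(baseline_bits(truth, num_classes=4))
```

Under this sketch, an instance with multiple predictions or no prediction still receives a well-defined cost in bits, whereas a percentage correct score has to adopt some convention for counting such cases.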

Similar resources

Application of statistical techniques and artificial neural network to estimate force from sEMG signals

This paper presents an application of design-of-experiments techniques to determine the optimized parameters of an artificial neural network (ANN) used to estimate force from surface electromyogram (sEMG) signals. The accuracy of the ANN model is highly dependent on the network parameter settings. Many algorithms are available for obtaining the optimal ANN settings. However, to the best ...

Estimate Output of a Production Unit in Production Possibility Set with Fuzzy Inference Mechanism

In this paper, we consider the production possibility set with n production units, governed by the following four principles: inclusion of observations, conceivability, immensity and convexity. Our goal is to estimate the output of a new, similar production unit, given the existing production possibility set and a specified amount of input. So, initially we find the interval changes of each input...

A Novel DOA Estimation Approach for Unknown Coherent Source Groups with Coherent Signals

In this paper, a new combination of the Minimum Description Length (MDL) or Eigenvalue Gradient Method (EGM), Joint Approximate Diagonalization of Eigenmatrices (JADE) and Modified Forward-Backward Linear Prediction (MFBLP) algorithms is proposed, which determines the number of non-coherent source groups and estimates the Directions of Arrival (DOAs) of the coherent signals in each group. First, the MDL...

Money Growth Rules in an Emerging Small Open Economy with an informal sector

This paper is concerned with the saddle-path stability of monetary growth rules in a two-country two-sector dynamic stochastic general equilibrium model. Alongside standard features of emerging economies, such as a combination of producer and local currency pricing for exports, fiscal dominance and oil exports, this model also incorporates informal labour and production sectors and examines how...

Asymptotic MAP criteria for model selection

The two most popular model selection rules in the signal processing literature have been Akaike's criterion (AIC) and Rissanen's principle of minimum description length (MDL). These rules are similar in form in that they both consist of data and penalty terms. Their data terms are identical, but the penalties are different, the MDL being more stringent toward overparameterization. The AIC p...

Publication date: 1996